Variational Dropout and the Local Reparameterization Trick
We investigate a local reparameterization technique for greatly reducing the
variance of stochastic gradients for variational Bayesian inference (SGVB) of a
posterior over model parameters, while retaining parallelizability. This local
reparameterization translates uncertainty about global parameters into local
noise that is independent across datapoints in the minibatch. Such
parameterizations can be trivially parallelized and have variance that is
inversely proportional to the minibatch size, generally leading to much faster
convergence. Additionally, we explore a connection with dropout: Gaussian
dropout objectives correspond to SGVB with local reparameterization, a
scale-invariant prior and proportionally fixed posterior variance. Our method
allows inference of more flexibly parameterized posteriors; specifically, we
propose variational dropout, a generalization of Gaussian dropout where the
dropout rates are learned, often leading to better models. The method is
demonstrated through several experiments.
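The core trick the abstract describes can be sketched in a few lines: for a layer with an independent Gaussian posterior over each weight, instead of sampling a weight matrix and multiplying, one samples the pre-activations directly, so the noise is fresh and independent per datapoint in the minibatch. This is a minimal NumPy sketch; the variable names and shapes are illustrative, not from the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer with a factorized Gaussian posterior over weights:
# q(W_ij) = N(mu_ij, sigma2_ij). All names/shapes here are illustrative.
B, D_in, D_out = 8, 4, 3
x = rng.normal(size=(B, D_in))                 # minibatch of inputs
mu = rng.normal(size=(D_in, D_out))            # posterior means
sigma2 = np.exp(rng.normal(size=(D_in, D_out)) - 3.0)  # posterior variances

# Local reparameterization: for independent Gaussian weights, the
# pre-activation y_bj is itself Gaussian with
#   mean  = sum_i x_bi * mu_ij
#   var   = sum_i x_bi^2 * sigma2_ij,
# so we can sample it directly, with noise independent across datapoints.
act_mean = x @ mu                              # (B, D_out)
act_var = (x ** 2) @ sigma2                    # (B, D_out)
eps = rng.normal(size=act_mean.shape)          # one noise draw per datapoint
y = act_mean + np.sqrt(act_var) * eps

print(y.shape)  # (8, 3)
```

Sampling pre-activations rather than weights is what makes the estimator trivially parallelizable across the minibatch, and its gradient variance shrinks as the minibatch grows, as the abstract states.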
Auto-Encoding Variational Bayes
How can we perform efficient inference and learning in directed probabilistic
models, in the presence of continuous latent variables with intractable
posterior distributions, and large datasets? We introduce a stochastic
variational inference and learning algorithm that scales to large datasets and,
under some mild differentiability conditions, even works in the intractable
case. Our contribution is two-fold. First, we show that a reparameterization
of the variational lower bound yields a lower bound estimator that can be
straightforwardly optimized using standard stochastic gradient methods. Second,
we show that for i.i.d. datasets with continuous latent variables per
datapoint, posterior inference can be made especially efficient by fitting an
approximate inference model (also called a recognition model) to the
intractable posterior using the proposed lower bound estimator. Theoretical
advantages are reflected in experimental results.
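The reparameterization of the lower bound mentioned in the first contribution can be sketched concretely for a diagonal-Gaussian posterior: writing the sample as z = mu + sigma * eps with eps ~ N(0, I) makes the Monte Carlo estimate differentiable with respect to mu and sigma. A minimal sketch, assuming placeholder values for the posterior parameters (the encoder and decoder networks are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

# Diagonal-Gaussian approximate posterior q(z|x) = N(mu, diag(sigma^2)).
# mu and log_var are placeholders standing in for encoder outputs.
mu = np.array([0.5, -1.0])
log_var = np.array([-0.5, 0.2])
sigma = np.exp(0.5 * log_var)

# Reparameterized (pathwise) sample: gradients flow through mu and sigma,
# while the randomness lives in the parameter-free noise eps.
eps = rng.normal(size=mu.shape)
z = mu + sigma * eps

# Analytic KL(q(z|x) || N(0, I)) for the diagonal-Gaussian case, the
# closed-form term of the lower bound; the reconstruction term would be
# estimated from z via the (omitted) decoder.
kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
print(z.shape, kl > 0)
```

With this form, the lower bound estimator can be optimized by standard stochastic gradient methods, exactly as the abstract claims.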
Understanding the Diffusion Objective as a Weighted Integral of ELBOs
Diffusion models in the literature are optimized with various objectives that
are special cases of a weighted loss, where the weighting function specifies
the weight per noise level. Uniform weighting corresponds to maximizing the
ELBO, a principled approximation of maximum likelihood. In current practice,
diffusion models are optimized with non-uniform weighting due to better results
in terms of sample quality. In this work we expose a direct relationship
between the weighted loss (with any weighting) and the ELBO objective.
We show that the weighted loss can be written as a weighted integral of
ELBOs, with one ELBO per noise level. If the weighting function is monotonic,
then the weighted loss is a likelihood-based objective: it maximizes the ELBO
under simple data augmentation, namely Gaussian noise perturbation. Our main
contribution is a deeper theoretical understanding of the diffusion objective,
but we also performed some experiments comparing monotonic with non-monotonic
weightings, finding that monotonic weighting performs competitively with the
best published results.
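The structure of the weighted loss can be illustrated with a toy discretization: a sum over noise levels of a per-level weighting times a denoising error. This sketch only shows the role of the weighting function w over noise levels; the "denoiser" is a trivial stand-in, and all names and the particular schedule are assumptions for illustration, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discretized weighted diffusion loss:
#   loss ≈ sum over noise levels sigma of  w(sigma) * E||eps_hat - eps||^2
# The weighting w assigns a weight per noise level; uniform w would
# correspond to the ELBO case discussed in the abstract.
x = rng.normal(size=(16, 2))               # clean "data"
noise_levels = np.linspace(0.1, 1.0, 10)   # assumed discretized schedule

def monotonic_w(sigma):
    # A monotonically increasing weighting (an arbitrary illustrative choice).
    return sigma

total = 0.0
for sigma in noise_levels:
    eps = rng.normal(size=x.shape)
    x_noisy = x + sigma * eps              # Gaussian noise perturbation
    eps_hat = np.zeros_like(eps)           # placeholder denoiser prediction
    mse = np.mean((eps_hat - eps) ** 2)
    total += monotonic_w(sigma) * mse

print(total > 0)
```

In the paper's analysis, each noise level contributes its own ELBO term, and a monotonic w keeps the overall objective likelihood-based (an ELBO under Gaussian-noise data augmentation).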
An Introduction to Variational Autoencoders
Variational autoencoders provide a principled framework for learning deep
latent-variable models and corresponding inference models. In this work, we
provide an introduction to variational autoencoders and some important
extensions.
- …